

**WHAT IS CLAIMED IS:**

1. A decode unit coupled to receive instruction bytes, the decode unit coupled to dispatch instructions to an execution subsystem, wherein the decode unit comprises circuitry divided into a pipeline including a plurality of pipeline stages, the circuitry configured to concurrently initiate decode of a plurality of instructions, wherein the circuitry is configured to dispatch at least an initial instruction of the plurality of instructions from a first pipeline stage of the plurality of pipeline stages, and wherein the circuitry is configured to dispatch at least one remaining instruction of the plurality of instructions from a second pipeline stage of the plurality of pipeline stages, and wherein the second pipeline stage is subsequent to the first pipeline stage in the pipeline.  
2. The decode unit as recited in claim 1 wherein the circuitry is configured to dispatch each remaining instruction of the plurality of instructions from the second pipeline stage.  
3. The decode unit as recited in claim 2 wherein the second pipeline stage is a last stage of the pipeline.
4. The decode unit as recited in claim 1 wherein the decode unit is configured to dispatch instructions into a plurality of positions input to the execution subsystem, wherein the plurality of positions are indicative of a program order of the instructions concurrently dispatched into the plurality of positions.  
5. The decode unit as recited in claim 4 wherein the circuitry is configured to dispatch the initial instruction into a last position of the plurality of positions, the last position being ordered subsequent to each other one of the plurality of positions.  
6. The decode unit as recited in claim 5 wherein the circuitry is configured to dispatch the remaining instructions into other ones of the plurality of positions.

7. The decode unit as recited in claim 6 wherein the decode unit is coupled to receive a second plurality of instructions subsequent to the plurality of instructions, and wherein the circuitry is configured to dispatch at least a second initial instruction of the second plurality of instructions into one or more of the last positions concurrent with dispatching the remaining instructions.

8. The decode unit as recited in claim 4 wherein the circuitry is configured to dispatch a plurality of initial instructions from the first pipeline stage into a plurality of last positions of the plurality of positions.

9. The decode unit as recited in claim 1 wherein the plurality of instructions are variable length.

10. The decode unit as recited in claim 1 wherein the circuitry includes a first circuit operable on the initial instruction in a third stage of the plurality of pipeline stages, the first circuit configured to perform a first portion of decoding the initial instruction, and wherein the circuitry further includes a second circuit operable on the remaining instruction in a fourth pipeline stage of the plurality of pipeline stages, the second circuit configured to perform the first portion of decoding the remaining instruction.

11. A processor comprising:

a decode unit coupled to receive instruction bytes, wherein the decode unit comprises circuitry divided into a pipeline including a plurality of pipeline stages, the circuitry configured to concurrently initiate decode of a plurality of instructions, wherein the circuitry is configured to dispatch at least an initial instruction of the plurality of instructions from a first pipeline stage of the plurality of pipeline stages, and wherein the circuitry is configured to dispatch at least one remaining instruction of the plurality of instructions from a second pipeline stage of the plurality of pipeline stages, and wherein the second pipeline stage is subsequent to the first pipeline stage in the pipeline; and

an execution subsystem coupled to receive the initial instruction and the remaining instruction and configured to execute the initial instruction and the remaining instruction.

12. The processor as recited in claim 11 wherein the circuitry is configured to dispatch each remaining instruction of the plurality of instructions from the second pipeline stage.

13. The processor as recited in claim 12 wherein the second pipeline stage is a last stage of the pipeline.

14. The processor as recited in claim 11 wherein the decode unit is configured to dispatch instructions into a plurality of positions input to the execution subsystem, wherein the plurality of positions are indicative, to the execution subsystem, of a program order of the instructions concurrently dispatched into the plurality of positions.

15. The processor as recited in claim 14 wherein the circuitry is configured to dispatch the initial instruction into a last position of the plurality of positions, the last position being ordered subsequent to each other one of the plurality of positions.

16. The processor as recited in claim 15 wherein the circuitry is configured to dispatch the remaining instructions into other ones of the plurality of positions.

17. The processor as recited in claim 16 wherein the decode unit is coupled to receive a second plurality of instructions subsequent to the plurality of instructions, and wherein the circuitry is configured to dispatch at least a second initial instruction of the second plurality of instructions into one or more of the last positions concurrent with dispatching the remaining instructions.

18. The processor as recited in claim 11 wherein the decode unit is configured to dispatch a plurality of initial instructions from the first pipeline stage into a plurality of last positions of the plurality of positions.

19. The processor as recited in claim 11 wherein the plurality of instructions are variable length.

20. The processor as recited in claim 11 wherein the circuitry includes a first circuit operable on the initial instruction in a third stage of the plurality of pipeline stages, the first circuit configured to perform a first portion of decoding the initial instruction, and wherein the circuitry further includes a second circuit operable on the remaining instruction in a fourth pipeline stage of the plurality of pipeline stages, the second circuit configured to perform the first portion of decoding the remaining instruction.

21. A decode unit coupled to receive instruction bytes, the decode unit coupled to dispatch instructions into a plurality of positions input to an execution subsystem, wherein the plurality of positions are indicative, to the execution subsystem, of a program order of instructions concurrently dispatched into the plurality of positions, wherein the decode unit is configured to concurrently initiate decode of a plurality of instructions, and wherein the decode unit is configured to decode one or more initial instructions of the plurality of instructions with a first decode latency and to dispatch the one or more initial instructions into one or more last positions of the plurality of positions, the last positions being ordered subsequent to each of the other positions of the plurality of positions, and wherein the decode unit is configured to decode at least one remaining instruction with a second decode latency and to dispatch the remaining instruction into a different one of the plurality of positions.

22. The decode unit as recited in claim 21 wherein the decode unit is coupled to concurrently receive a second plurality of instructions, and wherein the decode unit is configured to decode a second one or more initial instructions of the second plurality of instructions and to dispatch the second initial instructions into the one or more last positions concurrently with dispatching the remaining instruction into the different one of the plurality of positions.